By Elisabeth Cadman
Vancouver is a lush, green city with abundant trees lining the streets in all its neighbourhoods. But do some neighbourhoods have vastly more trees than other neighbourhoods? And what different varieties of trees can be found in each neighbourhood? I want to know what kind of trees I'm likely to find when I'm walking through Vancouver, so let's see what we can find out.
The data used in this analysis comes from the City of Vancouver's Street Trees dataset. The version used here is a 5,000-row sample prepared by the developers/instructors of the UBC Extended Learning Data Visualization Course, and it is loaded directly from the URL in the code below.
import altair as alt
import pandas as pd

# Raise Altair's default 5,000-row limit for embedded data
alt.data_transformers.enable('default', max_rows=1000000)
url = 'https://raw.githubusercontent.com/UBC-MDS/data_viz_wrangled/main/data/Trees_data_sets/small_unique_vancouver.csv'
trees_df = pd.read_csv(url, parse_dates=['date_planted'])
print(trees_df)  # quick look at the first and last rows
Unnamed: 0 std_street on_street species_name \
0 10747 W 20TH AV W 20TH AV PLATANOIDES
1 12573 W 18TH AV W 18TH AV CALLERYANA
2 29676 ROSS ST ROSS ST NIGRA
3 8856 DOMAN ST DOMAN ST AMERICANA
4 21098 EAST BOULEVARD EAST BOULEVARD HIPPOCASTANUM
... ... ... ... ...
4995 6132 E 53RD AV E 53RD AV SERRULATA
4996 5642 E 32ND AV E 32ND AV XX
4997 8777 DAWSON ST DAWSON ST TULIPIFERA
4998 23489 E 13TH AV E 13TH AV INVOLUCRATA
4999 7450 CULLODEN ST CULLODEN ST CAMPESTRE
neighbourhood_name date_planted diameter street_side_name \
0 Riley Park 2000-02-23 28.5 EVEN
1 Arbutus-Ridge 1992-02-04 6.0 ODD
2 Sunset NaT 12.0 ODD
3 Killarney 1999-11-12 11.0 EVEN
4 Shaughnessy NaT 15.5 ODD
... ... ... ... ...
4995 Victoria-Fraserview NaT 17.0 EVEN
4996 Kensington-Cedar Cottage 2014-01-14 3.0 EVEN
4997 Killarney 2002-04-15 3.5 EVEN
4998 Mount Pleasant 2003-12-02 5.5 EVEN
4999 Kensington-Cedar Cottage NaT 3.0 ODD
genus_name assigned ... plant_area curb tree_id \
0 ACER N ... 15 Y 21421
1 PYRUS N ... 7 Y 129645
2 PINUS N ... 7 Y 154675
3 FRAXINUS N ... 7 Y 180803
4 AESCULUS Y ... N Y 74364
... ... ... ... ... ... ...
4995 PRUNUS N ... 9 Y 47059
4996 CORNUS N ... 10 N 247874
4997 LIRIODENDRON N ... 7 Y 192642
4998 DAVIDIA N ... 5 Y 202500
4999 ACER N ... 8 Y 259433
common_name height_range_id on_street_block \
0 NORWAY MAPLE 4 0
1 CHANTICLEER PEAR 2 2300
2 AUSTRIAN PINE 4 7800
3 AUTUMN APPLAUSE ASH 4 6900
4 COMMON HORSECHESTNUT 4 5200
... ... ... ...
4995 KWANZAN FLOWERING CHERRY 2 2200
4996 EDDIES WHITE WONDER DOGWOOD 1 1700
4997 ARNOLD TULIPTREE 2 6500
4998 DOVE OR HANDKERCHIEF TREE 1 300
4999 RED SHINE MAPLE 1 4500
cultivar_name root_barrier latitude longitude
0 NaN N 49.252711 -123.106323
1 CHANTICLEER N 49.256350 -123.158709
2 NaN N 49.213486 -123.083254
3 AUTUMN APPLAUSE N 49.220839 -123.036721
4 NaN N 49.238514 -123.154958
... ... ... ... ...
4995 KWANZAN N 49.221161 -123.061023
4996 EDDIE'S WHITE WONDER N 49.241544 -123.070644
4997 ARNOLD N 49.224511 -123.048723
4998 NaN Y 49.259208 -123.096905
4999 RED SHINE N 49.243772 -123.078967
[5000 rows x 21 columns]
trees_df.head()
| | Unnamed: 0 | std_street | on_street | species_name | neighbourhood_name | date_planted | diameter | street_side_name | genus_name | assigned | ... | plant_area | curb | tree_id | common_name | height_range_id | on_street_block | cultivar_name | root_barrier | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10747 | W 20TH AV | W 20TH AV | PLATANOIDES | Riley Park | 2000-02-23 | 28.5 | EVEN | ACER | N | ... | 15 | Y | 21421 | NORWAY MAPLE | 4 | 0 | NaN | N | 49.252711 | -123.106323 |
| 1 | 12573 | W 18TH AV | W 18TH AV | CALLERYANA | Arbutus-Ridge | 1992-02-04 | 6.0 | ODD | PYRUS | N | ... | 7 | Y | 129645 | CHANTICLEER PEAR | 2 | 2300 | CHANTICLEER | N | 49.256350 | -123.158709 |
| 2 | 29676 | ROSS ST | ROSS ST | NIGRA | Sunset | NaT | 12.0 | ODD | PINUS | N | ... | 7 | Y | 154675 | AUSTRIAN PINE | 4 | 7800 | NaN | N | 49.213486 | -123.083254 |
| 3 | 8856 | DOMAN ST | DOMAN ST | AMERICANA | Killarney | 1999-11-12 | 11.0 | EVEN | FRAXINUS | N | ... | 7 | Y | 180803 | AUTUMN APPLAUSE ASH | 4 | 6900 | AUTUMN APPLAUSE | N | 49.220839 | -123.036721 |
| 4 | 21098 | EAST BOULEVARD | EAST BOULEVARD | HIPPOCASTANUM | Shaughnessy | NaT | 15.5 | ODD | AESCULUS | Y | ... | N | Y | 74364 | COMMON HORSECHESTNUT | 4 | 5200 | NaN | N | 49.238514 | -123.154958 |
5 rows × 21 columns
trees_df.describe()
| | Unnamed: 0 | diameter | civic_number | tree_id | height_range_id | on_street_block | latitude | longitude |
|---|---|---|---|---|---|---|---|---|
| count | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | 5000.00000 | 5000.000000 | 5000.000000 | 5000.000000 |
| mean | 14861.920400 | 12.340888 | 2975.707600 | 128682.584600 | 2.73440 | 2960.227000 | 49.247349 | -123.107128 |
| std | 8680.023278 | 9.266600 | 2078.580429 | 75412.260406 | 1.56957 | 2086.861052 | 0.021251 | 0.049137 |
| min | 2.000000 | 0.000000 | 2.000000 | 36.000000 | 0.00000 | 0.000000 | 49.202783 | -123.220560 |
| 25% | 7192.750000 | 4.000000 | 1300.500000 | 61321.500000 | 2.00000 | 1300.000000 | 49.230152 | -123.144178 |
| 50% | 14870.000000 | 10.000000 | 2639.000000 | 130130.500000 | 2.00000 | 2600.000000 | 49.247981 | -123.105861 |
| 75% | 22366.750000 | 18.000000 | 4123.000000 | 191332.000000 | 4.00000 | 4100.000000 | 49.263275 | -123.063484 |
| max | 29992.000000 | 71.000000 | 9113.000000 | 270750.000000 | 9.00000 | 9100.000000 | 49.293930 | -123.023311 |
trees_df.describe(exclude='number')
| | std_street | on_street | species_name | neighbourhood_name | date_planted | street_side_name | genus_name | assigned | plant_area | curb | common_name | cultivar_name | root_barrier |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 5000 | 5000 | 5000 | 5000 | 2363 | 5000 | 5000 | 5000 | 4950 | 5000 | 5000 | 2658 | 5000 |
| unique | 603 | 607 | 171 | 22 | 1599 | 4 | 67 | 2 | 38 | 2 | 361 | 176 | 2 |
| top | W 13TH AV | CAMBIE ST | SERRULATA | Renfrew-Collingwood | 2004-02-16 00:00:00 | ODD | ACER | N | 10 | Y | KWANZAN FLOWERING CHERRY | KWANZAN | N |
| freq | 52 | 49 | 463 | 384 | 7 | 2554 | 1218 | 4564 | 736 | 4593 | 383 | 383 | 4679 |
| first | NaN | NaN | NaN | NaN | 1989-10-31 00:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| last | NaN | NaN | NaN | NaN | 2019-05-07 00:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
genus_count = trees_df.genus_name.value_counts().rename_axis('genus_name').reset_index(name='count')
genus_count
| | genus_name | count |
|---|---|---|
| 0 | ACER | 1218 |
| 1 | PRUNUS | 1050 |
| 2 | FRAXINUS | 238 |
| 3 | TILIA | 238 |
| 4 | QUERCUS | 218 |
| ... | ... | ... |
| 62 | NOTHOFAGUS | 1 |
| 63 | ARAUCARIA | 1 |
| 64 | SOPHORA | 1 |
| 65 | PTELEA | 1 |
| 66 | CLADRASTIS | 1 |
67 rows × 2 columns
Now I can see that the dataframe includes 22 different neighbourhoods and 67 different genera, which are further broken down into species and common varieties. I'm going to look at which genera of trees are in which neighbourhoods.
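As a rough check of how dominant the top genera are, the counts can also be turned into shares of all trees. The following is only a minimal sketch: the `genus_shares` helper and the `toy` frame are made up for illustration and are not part of the original notebook, where `trees_df` would be passed in instead.

```python
import pandas as pd


def genus_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Return each genus with its tree count and its share of all trees."""
    counts = (
        df["genus_name"]
        .value_counts()
        .rename_axis("genus_name")
        .reset_index(name="count")
    )
    # Share of the total number of trees, between 0 and 1
    counts["share"] = counts["count"] / counts["count"].sum()
    return counts


# Toy stand-in for trees_df (hypothetical data, for illustration only)
toy = pd.DataFrame({"genus_name": ["ACER", "ACER", "PRUNUS", "TILIA"]})
shares = genus_shares(toy)
print(shares)
```

Applied to the counts shown above, ACER (1,218) and PRUNUS (1,050) together account for roughly 45% of the 5,000 trees in the sample.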
# base plot for genus counts
genus_count_plot = alt.Chart(genus_count).mark_bar(color='green').encode(
alt.X('genus_name:N', sort='-y', title='Genus'),
alt.Y('count:Q', title='Number of Trees'),
tooltip=[alt.Tooltip('count:Q', title='Quantity')]
).properties(title='Quantity of Trees in Each Genus', height=250)
genus_count_plot
# base plot for tree quantities in neighbourhoods
neighbourhood_trees_bar=alt.Chart(trees_df).mark_bar(color='darkblue').encode(
alt.X('count()', title='Number of Trees'),
alt.Y('neighbourhood_name', title='Neighbourhood'),
tooltip=[alt.Tooltip('count()', title='Quantity')]
).properties(title={'text' : 'Number of Trees in Each Neighbourhood', 'subtitle' : 'Click on a bar to select the neighbourhood. Double-click to clear.'})
neighbourhood_trees_bar
# add click selection to neighbourhood plot
select_neighbourhood_click = alt.selection_single(encodings=['y'], on='click')
select_neighbourhood = (neighbourhood_trees_bar.encode(
opacity=alt.condition(select_neighbourhood_click, alt.value(1), alt.value(0.2)))
.add_selection(select_neighbourhood_click)).properties(height=300, width=200)
select_neighbourhood
#combine plots and make them interactive
genus_per_neighbourhood = alt.Chart(trees_df).transform_filter(select_neighbourhood_click).mark_bar(color='green').encode(
alt.X('genus_name', sort='-y', title='Genus'),
alt.Y('count:Q', title='Number of Trees'),
tooltip=[alt.Tooltip('count:Q', title='Quantity')]
).transform_aggregate(count='count()',groupby=['genus_name']
).transform_window(rank='rank(count())',sort=[alt.SortField('count()', order='descending')]
).add_selection(select_neighbourhood_click).properties(title='Quantity of Trees in Each Genus')
combo_plot = select_neighbourhood | genus_per_neighbourhood
combo_plot
This plot answers my first question of which trees are present in which neighbourhoods. Now I'm curious about the size of trees in each neighbourhood.
# exploring size distribution in tree genera
genus_diameter_boxplot = alt.Chart(trees_df).mark_boxplot().encode(
alt.X('diameter'),
alt.Y('genus_name'))
genus_diameter_boxplot
# finding max diameter tree in each neighbourhood
# (select the diameter column before aggregating so non-numeric columns are not passed to .max)
neighbourhood_max = trees_df.groupby('neighbourhood_name')['diameter'].max().reset_index()
neighbourhood_max
| | neighbourhood_name | diameter |
|---|---|---|
| 0 | Arbutus-Ridge | 48.0 |
| 1 | Downtown | 28.0 |
| 2 | Dunbar-Southlands | 49.5 |
| 3 | Fairview | 46.0 |
| 4 | Grandview-Woodland | 51.0 |
| 5 | Hastings-Sunrise | 41.0 |
| 6 | Kensington-Cedar Cottage | 41.5 |
| 7 | Kerrisdale | 57.0 |
| 8 | Killarney | 40.0 |
| 9 | Kitsilano | 71.0 |
| 10 | Marpole | 56.0 |
| 11 | Mount Pleasant | 37.5 |
| 12 | Oakridge | 38.0 |
| 13 | Renfrew-Collingwood | 38.5 |
| 14 | Riley Park | 40.0 |
| 15 | Shaughnessy | 71.0 |
| 16 | South Cambie | 40.0 |
| 17 | Strathcona | 34.0 |
| 18 | Sunset | 45.0 |
| 19 | Victoria-Fraserview | 40.0 |
| 20 | West End | 41.5 |
| 21 | West Point Grey | 46.0 |
neighbourhood_max_plot = alt.Chart(neighbourhood_max).mark_circle(color='darkblue', size=75).encode(
alt.X('neighbourhood_name', title='Neighbourhood'),
alt.Y('diameter', title='Diameter'),
tooltip=('neighbourhood_name', 'diameter'))
neighbourhood_max_plot
# finding mean tree diameter in each neighbourhood
neighbourhood_mean = trees_df.groupby('neighbourhood_name')['diameter'].mean().reset_index()
neighbourhood_mean
| | neighbourhood_name | diameter |
|---|---|---|
| 0 | Arbutus-Ridge | 12.598571 |
| 1 | Downtown | 7.480117 |
| 2 | Dunbar-Southlands | 16.078115 |
| 3 | Fairview | 13.910821 |
| 4 | Grandview-Woodland | 12.603627 |
| 5 | Hastings-Sunrise | 12.185441 |
| 6 | Kensington-Cedar Cottage | 12.005600 |
| 7 | Kerrisdale | 13.904960 |
| 8 | Killarney | 10.030000 |
| 9 | Kitsilano | 15.080855 |
| 10 | Marpole | 12.419492 |
| 11 | Mount Pleasant | 13.401759 |
| 12 | Oakridge | 10.236263 |
| 13 | Renfrew-Collingwood | 10.308724 |
| 14 | Riley Park | 12.676829 |
| 15 | Shaughnessy | 14.162611 |
| 16 | South Cambie | 12.402542 |
| 17 | Strathcona | 12.447333 |
| 18 | Sunset | 11.147249 |
| 19 | Victoria-Fraserview | 10.456678 |
| 20 | West End | 12.842520 |
| 21 | West Point Grey | 13.256250 |
neighbourhood_mean_plot = alt.Chart(neighbourhood_mean).mark_circle(color='darkblue', size=75).encode(
alt.X('neighbourhood_name', title='Neighbourhood'),
alt.Y('diameter', title='Diameter'),
tooltip=('neighbourhood_name', 'diameter'))
neighbourhood_mean_plot
I'm going to include plots of both the mean tree diameters and the max tree diameters in my dashboard to get a more comprehensive view of the tree sizes.
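Since both summaries come from the same grouping, they could also be computed in a single pass. This is a sketch under assumptions, not the notebook's own approach: `diameter_summary` and the `toy` frame are hypothetical names introduced here, and the real `trees_df` would take the place of `toy`.

```python
import pandas as pd


def diameter_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and maximum tree diameter per neighbourhood, in one table."""
    return (
        df.groupby("neighbourhood_name")["diameter"]
        .agg(mean_diameter="mean", max_diameter="max")
        .reset_index()
    )


# Toy stand-in for trees_df (hypothetical data, for illustration only)
toy = pd.DataFrame(
    {
        "neighbourhood_name": ["Downtown", "Downtown", "Kitsilano", "Kitsilano"],
        "diameter": [4.0, 8.0, 10.0, 30.0],
    }
)
summary = diameter_summary(toy)
print(summary)
```

Keeping both statistics in one frame would make it possible to draw the two charts from a single data source, at the cost of departing from the separate `neighbourhood_mean` and `neighbourhood_max` frames used here.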
neighbourhood_mean_plot = alt.Chart(neighbourhood_mean).mark_circle(color='orange', size=75).encode(
alt.X('neighbourhood_name', title='Neighbourhood'),
alt.Y('diameter', title='Diameter (in)'),
tooltip=('neighbourhood_name', 'diameter'),
opacity=alt.condition(select_neighbourhood_click, alt.value(0.9), alt.value(0.2))
).add_selection(select_neighbourhood_click).properties(title='Average Diameter of Trees in Each Neighbourhood')
neighbourhood_mean_plot
neighbourhood_max_plot = alt.Chart(neighbourhood_max).mark_circle(color='maroon', size=75).encode(
alt.X('neighbourhood_name', title='Neighbourhood'),
alt.Y('diameter', title='Diameter (in)'),
tooltip=('neighbourhood_name', 'diameter'),
opacity=alt.condition(select_neighbourhood_click, alt.value(0.9), alt.value(0.2))
).add_selection(select_neighbourhood_click).properties(title='Largest Tree Diameter in Each Neighbourhood')
neighbourhood_max_plot
dashboard = (select_neighbourhood | genus_per_neighbourhood) & (neighbourhood_mean_plot | neighbourhood_max_plot)
dashboard
There is a lot to learn about the trees of Vancouver's streets! I've learned that there's a huge range in the number of trees present in the different neighbourhoods of this 5,000-tree sample: only 75 in Strathcona versus 384 in Renfrew-Collingwood.
Another observation is that two genera, Acer and Prunus, are the most abundant in every neighbourhood except one: Downtown, where the two most abundant genera are Acer and Fagus. Downtown also has the smallest average tree diameter (about 7.5 in) and the smallest maximum tree diameter (28 in) of all the neighbourhoods, so a future route of inquiry could be whether Prunus trees tend to have large diameters, and whether the lack of Prunus trees Downtown contributes to the smaller average tree diameter in that neighbourhood.
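The "top two genera per neighbourhood" observation was read off the interactive chart, but it could also be checked in a table. A minimal sketch, assuming a frame with the same `neighbourhood_name` and `genus_name` columns as `trees_df`; the `top_genera` helper and the `toy` data are invented for illustration.

```python
import pandas as pd


def top_genera(df: pd.DataFrame, n: int = 2) -> pd.DataFrame:
    """Return the n most common genera in each neighbourhood."""
    counts = (
        df.groupby(["neighbourhood_name", "genus_name"])
        .size()
        .reset_index(name="count")
        # Largest counts first within each neighbourhood
        .sort_values(["neighbourhood_name", "count"], ascending=[True, False])
    )
    return counts.groupby("neighbourhood_name").head(n).reset_index(drop=True)


# Toy stand-in for trees_df (hypothetical data, for illustration only)
toy = pd.DataFrame(
    {
        "neighbourhood_name": ["Downtown"] * 6 + ["Kitsilano"] * 6,
        "genus_name": [
            "ACER", "ACER", "ACER", "FAGUS", "FAGUS", "PRUNUS",
            "PRUNUS", "PRUNUS", "PRUNUS", "ACER", "ACER", "TILIA",
        ],
    }
)
top_two = top_genera(toy)
print(top_two)
```

Ties are not handled specially in this sketch: when two genera have the same count, whichever sorts first is kept.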
Interestingly, there doesn't appear to be a strong correlation between the number of trees in a neighbourhood and the average or maximum tree diameter. Renfrew-Collingwood, which has the most trees, has one of the smaller average diameters (about 10.3 in) and a relatively small maximum diameter (38.5 in). My initial thought was that the neighbourhood with the most trees would also be likely to have some large trees, but this isn't the case. One possible explanation, which this data alone cannot confirm, is that neighbourhoods with more trees tend to have smaller ones so that more trees fit in a given area.
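The impression of a weak relationship comes from comparing the charts by eye; a correlation coefficient would put a number on it. The sketch below is one way this might be done, assuming the same column names as `trees_df`. The `count_vs_size` helper and the `toy` data are hypothetical, and with only 22 neighbourhoods a Pearson coefficient is at best a rough indicator.

```python
import pandas as pd


def count_vs_size(df: pd.DataFrame):
    """Per-neighbourhood tree count and mean diameter, plus their Pearson r."""
    per_hood = df.groupby("neighbourhood_name")["diameter"].agg(
        n_trees="count",  # counts trees with a non-missing diameter
        mean_diameter="mean",
    )
    r = per_hood["n_trees"].corr(per_hood["mean_diameter"])
    return per_hood, r


# Toy stand-in for trees_df (hypothetical data, for illustration only)
toy = pd.DataFrame(
    {
        "neighbourhood_name": ["A", "B", "B", "C", "C", "C"],
        "diameter": [30.0, 15.0, 25.0, 5.0, 10.0, 15.0],
    }
)
per_hood, r = count_vs_size(toy)
print(per_hood)
print(round(r, 3))
```

In the toy data the neighbourhoods with more trees were deliberately given smaller trees, so the coefficient is strongly negative; the real data would need to be run to see where it falls.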
In future inquiries, I would like to look further into the sizes of trees of different genera, and perhaps even into whether the species within a genus tend to vary widely in size.
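As a possible starting point for that follow-up, a per-genus size summary might look like the sketch below. It assumes the `genus_name` and `diameter` columns from `trees_df`; the `genus_diameter_summary` helper, the `min_trees` cut-off and the `toy` data are all assumptions made for illustration.

```python
import pandas as pd


def genus_diameter_summary(df: pd.DataFrame, min_trees: int = 2) -> pd.DataFrame:
    """Median, min and max diameter per genus, skipping very rare genera."""
    summary = df.groupby("genus_name")["diameter"].agg(
        n_trees="count",
        median_diameter="median",
        min_diameter="min",
        max_diameter="max",
    )
    # Genera with only a handful of trees give unreliable summaries
    summary = summary[summary["n_trees"] >= min_trees]
    return summary.sort_values("median_diameter", ascending=False)


# Toy stand-in for trees_df (hypothetical data, for illustration only)
toy = pd.DataFrame(
    {
        "genus_name": ["ACER", "ACER", "ACER", "PRUNUS", "PRUNUS", "PTELEA"],
        "diameter": [10.0, 20.0, 30.0, 5.0, 15.0, 3.0],
    }
)
by_genus = genus_diameter_summary(toy)
print(by_genus)
```

Grouping on `['genus_name', 'species_name']` instead would give the within-genus view, though many species in this sample would likely have too few trees to compare.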